Abstract

Compressed sensing is a relatively new signal processing technique whereby the limits
proposed by the Shannon-Nyquist theorem can be exceeded under certain conditions imposed
upon the signal. Such conditions occur in many real-world scenarios, and compressed
sensing has emerging applications in medical imaging, big data, and statistics. Finding practical
matrix constructions and computationally efficient recovery algorithms for compressed
sensing is an area of intense research interest. Many probabilistic matrix constructions have
been proposed, and it is now well known that matrices with entries drawn from a suitable
probability distribution are essentially optimal for compressed sensing.

Potential applications have motivated the search for constructions of sparse compressed
sensing matrices (i.e. matrices containing few non-zero entries). Various constructions have
been proposed, and simulations suggest that their performance is comparable to that of dense
matrices. In this paper extensive simulations are presented which suggest that sparsification
leads to a marked improvement in compressed sensing performance for a large class of matrix
constructions and for many different recovery algorithms.


1 Introduction 

Compressed sensing is an emerging paradigm in signal processing, developed in a series of 
ground-breaking publications by Donoho, Candes, Romberg, Tao and their collaborators over 
the past 10 years HHISHH. Many real-world signals have the special property of being sparse - 
they can be stored much more concisely than a random signal. Instead of sampling the whole 
signal and then applying data compression algorithms, sampling and compression of sparse 
signals can be achieved simultaneously. This process requires dramatically fewer measurements 
than the number dictated by the Shannon-Nyquist Theorem, but requires complex measurements 
that are incoherent with respect to the signal. The compressed sensing paradigm has generated 
an explosion of interest over the past few years within both the mathematical and electrical 
engineering research communities. 
A particularly significant application has been to Magnetic Resonance Imaging (MRI), for
which compressed sensing can speed up scans by a factor of five [23], either allowing increased
resolution from a given number of samples or allowing real-time imaging at clinically useful
resolutions. A major breakthrough achieved with compressed sensing has been real-time imaging
of the heart [35, 24]. The US National Institute of Biomedical Imaging and Bioengineering
published a news report in September 2014 describing compressed sensing as offering a 'vast
improvement' in paediatric MRI imaging [19]. Emerging applications of compressed sensing in
data mining and computer vision were described by Candes in a plenary lecture at the 2014 
International Congress of Mathematicians [7].

The central problems in compressed sensing can be framed in terms of linear algebra. In this
model, a signal is a vector v in some high-dimensional vector space, ℝ^N. The sampling process
can be described as multiplication by a specially chosen n × N matrix Φ, called the sensing matrix.
Typically we will have n ≪ N, so that the problem of recovering v from Φv is massively
under-determined.

A vector is k-sparse if it has at most k non-zero entries. The set of k-sparse vectors in ℝ^N plays
the role of the set of compressible signals in a communication system. The problem now is to
find necessary and sufficient conditions so that the inverse problem of finding v given Φ and Φv
is efficiently solvable.

If u and v are distinct k-sparse vectors for which Φu = Φv, then one of them is not recoverable.
Clearly, therefore, we require that the images of all k-sparse vectors under Φ are distinct, which
is equivalent to requiring that the null-space of Φ not contain any non-zero 2k-sparse vectors.
There is no known polynomial time algorithm to certify this property. We refer to the problem of
finding the sparsest solution x̂ to the linear system Φx̂ = Φx as the sparse recovery problem.
Natarajan has shown that certain instances of this problem are NP-hard [27].

Compressed sensing (CS) can be regarded as the study of methods for solving the sparse
recovery problem and its generalizations (e.g., sparse approximations of non-sparse signals,
solutions in the presence of noise) in a computationally efficient way. Most results in CS can be
characterized either as certifications that the sparse recovery problem is solvable for a restricted
class of matrices, or as the development of efficient computational methods for sparse recovery
for some given class of matrices.

One of the most important early developments in CS was a series of results of Candes,
Romberg, Tao and their collaborators. They established fundamental constraints for sparse
recovery: one cannot hope to recover k-sparse signals of length N in less than O(k log N)
measurements under any circumstances. (For k = 1, standard results from complexity theory show
that O(log N) measurements are required.) The main tools used to prove this result are the
restricted isometry parameters (RIP), which measure how the sensing matrix Φ distorts the ℓ2-norm
of sparse vectors. Specifically, Φ has the RIP(k, ε) property if, for every k-sparse vector v, the
following inequalities hold:

$$(1 - \epsilon)\,\|v\|_2^2 \;\le\; \|\Phi v\|_2^2 \;\le\; (1 + \epsilon)\,\|v\|_2^2.$$

Tools from Random Matrix Theory allow precise estimations of the RIP parameters of certain 
random matrices. In particular, it can be shown that the random Gaussian ensemble, which has 
entries drawn from a standard normal distribution, is asymptotically optimal for CS. A slightly 
weaker result is known for the random Fourier ensemble, a random selection of rows from the 
discrete Fourier transform matrix llsll^l^. 

As well as providing examples of asymptotically optimal CS matrices, Candes et al. provided
an efficient recovery algorithm: they showed that, under modest additional assumptions on the
RIP parameters of a matrix, ℓ1-minimization can be used for signal recovery. Thus efficient signal
recovery is possible in large systems, making applications to real world problems feasible.

Generation and storage of truly random matrices are potential obstacles to implementations 
of CS. It is also difficult to design efficient signal recovery algorithms capable of exploiting the 
structure of a random matrix. For implementation in real-world systems, it is desirable that CS 
constructions produce matrices that are sparse (possess relatively few non-zero entries), structured, 
and deterministically constructed. Systems with these properties can be stored implicitly, and 
efficient recovery algorithms can be designed to take advantage of their known structure. If Φ is
n × N with d non-zero entries per column, then computing Φv takes O(dN) operations, which is
a significant saving when d ≪ n. In some applications, signals are frequently subject to rank-one
updates (i.e. v is replaced by v + αe_j), in which case the image vector can be updated in time
0{d), cf. IMI- 
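
The two operation counts above can be illustrated with a minimal Python sketch. It assumes the sensing matrix is stored column by column as lists of its non-zero entries; the helper names (to_column_lists, sparse_matvec, update_image) are hypothetical and are not taken from the paper's code.

```python
import numpy as np

def to_column_lists(Phi):
    """Store each column as (row_indices, values) of its non-zero entries."""
    cols = []
    for j in range(Phi.shape[1]):
        rows = np.flatnonzero(Phi[:, j])
        cols.append((rows, Phi[rows, j]))
    return cols

def sparse_matvec(cols, v, n):
    """Compute Phi @ v touching only stored non-zero entries: O(dN) work."""
    y = np.zeros(n)
    for (rows, vals), vj in zip(cols, v):
        if vj != 0.0:
            y[rows] += vals * vj
    return y

def update_image(y, cols, j, alpha):
    """Update y = Phi @ v in place after v <- v + alpha * e_j: O(d) work."""
    rows, vals = cols[j]
    y[rows] += alpha * vals
    return y

# Small illustrative instance: d = 3 non-zero entries in each of N = 200 columns.
rng = np.random.default_rng(0)
n, N, d = 20, 200, 3
Phi = np.zeros((n, N))
for j in range(N):
    Phi[rng.choice(n, size=d, replace=False), j] = rng.standard_normal(d)

cols = to_column_lists(Phi)
v = rng.standard_normal(N)
y = sparse_matvec(cols, v, n)

# Rank-one update of the signal, mirrored on the image vector in O(d) time.
v_new = v.copy()
v_new[7] += 2.5
y_new = update_image(y.copy(), cols, 7, 2.5)
```

Only the d stored entries of column j are touched by the update, which is why the cost does not depend on n or N.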

Motivated by real-world applications, a number of papers have explored CS where the Gaussian
ensemble is replaced by a sparse random matrix (e.g. coming from an expander graph or
an LDPC code) |l3H3|l8l, or by a matrix obtained from a deterministic construction lUTlIT^ITbl . 
But to date, constructions meeting all three criteria have either been asymptotic in nature (i.e.
the results only produce matrices that are too large for practical implementations), or are known 
only to exist for a very restricted range of parameters. This investigation was inspired by work 
of the second author on constructions of sparse CS matrices from pairwise balanced designs and 
complex Hadamard matrices ||6H3. Some related work on constructing CS matrices from finite 
geometry is contained in [20, 37].

In this paper we take a new approach. Rather than constructing a sparse matrix and examining
its CS properties, we begin with a matrix which is known to possess good CS properties
(with high probability) and explore the effect of sparsification on this matrix. That is, we set
many of the entries in the original matrix to zero, and compare the performance of the sparse
matrix with the original. Results of Guo-Baron-Shamai suggest that sparse matrices should behave
similarly to dense matrices in our regime [17]. Surprisingly, we observe an improvement in
signal recovery as the sparsity increases.

First we survey some previous work on sparse compressed sensing matrices. Then in Section
3 we give a formal definition of sparsification, and describe algorithms used to generate random
matrices and random vectors as well as the recovery algorithms. In Section 4 we describe the
results of extensive simulations. These provide substantial computational evidence which suggests
that sparsification is a robust phenomenon, providing benefits in both recovery time and
proportion of successful recoveries for a wide range of random and structured matrices occurring
in the CS literature. In particular, Table 2 shows the benefits of sparsification for a range
of matrix constructions, while Figure 2 illustrates how sparsification improves performance for a
range of CS recovery algorithms. Finally in Section 5 we conclude with some observations and
open questions motivated by our numerical experiments.

2 Tradeoffs between sparsity and compressed sensing 

A number of authors have investigated methods of replacing random ensembles with more
computationally tractable sensing matrices. As previously mentioned, foundational results of
Candes-Romberg-Tao establish asymptotically sharp results: to recover signals of length N with
k non-zero entries, n = O(k log N) measurements are necessary. Work of Chandar established
that when n = Θ(k log N), then the columns of Φ must contain at least Ω(min{k, N/n}) non-zero
entries fTOl . In ||2^ , Nelson-Nguyen establish an essentially optimal result when n = 0(fclog N) 
and k < N/ log^ N. They show that each column of O necessarily contains 0(fclog N) non-zero 
entries; i.e. the proportion of non-zero entries in Φ cannot tend to zero.

Observe that some restriction on k as a function of N is necessary: in the limiting case k =
N, the identity matrix clearly suffices as a sparse sensing matrix. Furthermore, combinatorial
constructions of sparse matrices are known which have near optimal MIP recovery guarantees
[16]. In such matrices n = Θ(k^2), and for certain infinite families of matrices (e.g. those coming
from projective planes) the number of non-zero entries in each column is Θ(k). Results bounding
errors in the ℓ1-norm (so-called RIP-1 guarantees) have been obtained using expander graphs. In
particular, Bah-Tanner have shown that essentially optimal RIP-1 recovery can be achieved when
lim_{n→∞} N/n = α for some fixed α, with a constant number of non-zero entries per column [1].
(See also the discussion of dense vs sparse matrices in Section III of this paper.) These bounds
are strictly weaker than RIP-2 bounds, though fast specialised algorithms have been developed
for signal recovery with such matrices [32].

Since the k-RIP property is difficult to establish in practice, some authors have relaxed this
in various directions. Berinde-Gilbert-Indyk-Karloff-Strauss [3] considered random binary matrices
with constant column sum and related these to the incidence matrices of expander graphs.
We reinterpret these matrices as sparsifications of the all-ones matrix below. Sarvotham-Baron- 
Baraniuk and Dimakis-Smarandache-Vontobel l2l lT2l have considered the use of LDPC matrices. 
In particular, they have provided a strong correspondence between error-correcting performance
of LDPC codes (when considered over F_2) and CS performance of the same binary matrices
(when considered over ℝ). While both groups obtained essentially optimal CS performance
guarantees, their constructions are limited by the lack of known explicit constructions for
expander graphs and LDPC codes respectively. Moghadam-Radha have previously considered a
two step construction of sparse random matrices, involving construction of a random zero-one
matrix followed by replacing each one with a sample from some probability distribution [26, 25].

If one is content with recovery of each sparse vector with high probability, then much sparser 
matrices become useful. A strong result in this direction is due to Gilbert-Li-Porat-Strauss, who 
show that there exist matrices with n = 0(klog N) rows and 0(log^ klog N) non-zero entries per 
column which recover sparse vectors with probability 3/4 Ifl^ (c/. 11341 1. Their matrices also come 
with efficient encoding, updating and recovery algorithms. While essentially optimal results are 
known for sparsity bounds on CS matrices with an optimal number of rows, much less is known 
when either some redundant rows are allowed in the construction, or when RIP is replaced with 
a slightly weaker condition. 

Several authors have compared the performance of sparse and dense CS matrices HZl IMl SSI 
[21]. Guo-Baron-Shamai have essentially shown that, in certain limiting cases of the sparse
recovery problem, dense and sparse sensing matrices behave in a surprisingly similar manner. In
particular, they consider a variant of the recovery problem: given Φ and Φx, what can one say
about any single component of x? They show that, as the size of the system becomes large (in a
suitably controlled way), the problem of estimating x_i becomes independent of estimating x_j. In
fact, the problem is equivalent to recovering a single measurement of x_i contaminated by additive
Gaussian noise. They also apply their philosophy to sparse matrices, where as the size of the
matrix becomes large, estimation of all signal components becomes independent, and each can
be recovered independently. We refer the reader to the original paper for technical details. As a
result, under their assumptions, there should be no essential difference between CS performance 
in the sparse and dense cases. 

The Guo-Baron-Shamai definition of sparsity is interesting, as it illustrates some of the
subtleties that occur in this area. As usual, N is the number of columns, n the number of rows. They
introduce a parameter q such that qN → ∞ but qN^a → 0 for any a < 1. So q = log(N)/N would
work, for example. Then q is the proportion of non-zero entries in Φ. Guo-Baron-Shamai assume
that N = αn for some α. Then the number of non-zero entries in a column would be O(log n).
Guo-Baron-Shamai claim that such a matrix performs identically to a dense matrix: in particular,
there exist sparse matrices which recover vectors of sparsity O(n/log n). This appears to
be in conflict with Nelson-Nguyen: their result requires at least 0(n log logn/logn) non-zero 
entries in such a matrix. The difference is that Nelson-Nguyen holds only when n = 0(lclog^ N), 
whereas Guo-Baron-Shamai consider the case that n = 0{N). 
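
As a quick check of the sparsity regime just described, the example q = log(N)/N can be tested against both conditions, and the expected number of non-zero entries per column computed when N = αn. This is a worked calculation added for illustration; natural logarithms and a fixed α > 0 are assumed.

```latex
\begin{align*}
  qN &= \log N \;\longrightarrow\; \infty, \\
  qN^{a} &= \frac{\log N}{N^{1-a}} \;\longrightarrow\; 0
      \qquad \text{for every fixed } a < 1, \\
  qn &= \frac{n \log N}{N} = \frac{\log(\alpha n)}{\alpha} = O(\log n)
      \qquad \text{when } N = \alpha n .
\end{align*}
```

The last line is the expected column weight, since each of the n entries in a column is non-zero with probability q.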

Denoting by δ the probability that an entry of Φ is non-zero, the analysis of Wang-Wainwright-
Ramchandran depends on the quantity (1 - δ)k, which can be considered a measure of how much
information about x is captured in each co-ordinate of Φx. The main result is an asymptotic
analysis of how (1 - δ)k is related to n, the number of measurements required for signal recovery:
they show that if (1 - δ)k → ∞ (which corresponds to relatively dense matrices), then
sparsification has no effect on recovery, while if (1 - δ)k → 0, then what the authors term dramatically
more measurements are required. We refer the reader to the original paper for more details ||3^ . 

Our simulations are close in spirit to those considered by Guo-Baron-Shamai. Our computations
are rather surprising as they suggest a modest improvement in signal recovery as we
apply a sparsifying process to certain families of CS matrices. This improvement seems to persist
across different recovery algorithms and different matrix constructions, and does not appear to
have been noted in any of the work discussed in this section. (Though Lu-Li-Kpalma-Ronsin
have observed some improvement in CS performance for sparse binary matrices [22].) We also
observed a substantial improvement in the running times of the recovery algorithms, which may
be of considerable interest in practical applications.

3 Sparsification 

We begin with a formal definition of sparsification. 

Definition 1. The matrix Φ' is a sparsification of Φ if Φ'_{i,j} = Φ_{i,j} for every non-zero entry
Φ'_{i,j} of Φ'. The density of Φ, δ(Φ), is the proportion of non-zero entries that it contains, and the
relative density of Φ' is the ratio δ(Φ')/δ(Φ). We write Sp(Φ, s) for the set of all sparsifications of
Φ of relative density s.

In general, we have that Sp(Φ, 1) = {Φ}, and that Sp(Φ, 0) contains only the zero matrix. We also
have a transitive property: if Φ' ∈ Sp(Φ, s_1) and Φ'' ∈ Sp(Φ', s_2) then Φ'' ∈ Sp(Φ, s_1 s_2). Two
independent sparsifications will not in general be comparable - there is a partial ordering on the
set of sparsifications of a matrix, but not a total order.
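
The transitive property follows directly from Definition 1. A short derivation, writing δ(·) for the density and assuming δ(Φ') is non-zero so that the ratios are defined:

```latex
\begin{align*}
  \Phi''_{i,j} \neq 0
    &\;\Longrightarrow\; \Phi''_{i,j} = \Phi'_{i,j} \neq 0
     \;\Longrightarrow\; \Phi''_{i,j} = \Phi'_{i,j} = \Phi_{i,j}, \\
  \frac{\delta(\Phi'')}{\delta(\Phi)}
    &= \frac{\delta(\Phi'')}{\delta(\Phi')} \cdot \frac{\delta(\Phi')}{\delta(\Phi)}
     = s_2 \, s_1 .
\end{align*}
```

The first line shows that Φ'' is a sparsification of Φ, and the second gives its relative density.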

We illustrate our notation. Consider a Bernoulli random variable which takes value 1 with
probability p and value 0 with probability 1 - p; let Φ be an n × N matrix with entries drawn
from this distribution, in short a Bernoulli ensemble with expected value p. Then the expected
density of Φ is p. Writing J for the all-ones matrix, we have Φ ∈ Sp(J, p). If Φ' is a random
sparsification (i.e. all non-zero entries of the matrix have an equal probability to be set to zero) of
Φ with relative density p', then Φ' is easily seen to be a Bernoulli ensemble with expected value
pp'. So we have both Φ' ∈ Sp(Φ, p') and Φ' ∈ Sp(J, pp').

Bernoulli ensembles have previously been considered in the GS literature, see ISTI for exam¬ 
ple, though note that the matrices here take values in {0,1}, not {±1}. Such {±l}-matrices are 
an affine transformation of ours: M' = 2M - J; as a result, CS performance of either matrix is
essentially the same.

In this paper, we will mostly be interested in pseudo-random sparsifications of an n × N
compressed sensing matrix Φ. Specifically, for s = t/n, we obtain a matrix Φ' ∈ Sp(Φ, s) by
generating a pseudo-random {0,1}-matrix S with sn = t randomly located ones per column, and returning
the entry-wise product Φ' = Φ * S. We will generally re-normalize Φ' so that every column has
unit ℓ2-norm.
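
The construction just described can be sketched in a few lines of Python with NumPy. This is an illustrative sketch rather than the authors' implementation: the function name sparsify and the zero-column guard are assumptions, and the input matrix is assumed to have no zero entries, so that the relative density is exactly t/n.

```python
import numpy as np

def sparsify(Phi, t, rng):
    """Return a column-normalised sparsification of Phi with t non-zeros per column.

    Assumes every entry of Phi is non-zero, so the relative density is t / n.
    """
    n, N = Phi.shape
    S = np.zeros((n, N))
    for j in range(N):
        # Choose t distinct row positions for the ones in column j of the mask S.
        S[rng.choice(n, size=t, replace=False), j] = 1.0
    Phi_s = Phi * S                       # entry-wise product with the mask
    norms = np.linalg.norm(Phi_s, axis=0)
    norms[norms == 0.0] = 1.0             # guard against an all-zero column
    return Phi_s / norms                  # every non-zero column gets unit l2-norm

rng = np.random.default_rng(0)
Phi = rng.uniform(0.0, 1.0, size=(200, 2000))
Phi_s = sparsify(Phi, t=10, rng=rng)      # relative density s = 10/200 = 0.05
```

After re-normalization the surviving entries are rescaled column by column, so the result agrees with a member of the set of sparsifications up to a positive scaling of each column.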

Given a matrix O, we test its CS performance by running simulations. Since many different 
methodologies occur in the literature, we specify ours here. 

Our k-sparse vectors always contain exactly k non-zero entries, in positions chosen uniformly
at random from the (N choose k) possible supports of this size. The entries, unless otherwise
specified, are drawn from a uniform distribution on the open interval (0,1). The vector is then
scaled to have unit ℓ2-norm. Simulations where the non-zero entries were drawn from the absolute
value of a Gaussian distribution produced similar results. Note that many authors use (0,1)- or
(0, ±1)-vectors for their simulations. Appropriate combinations of matrices and algorithms often
exhibit dramatic improvements of performance on this restricted set of signals.
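
A minimal sketch of this signal model in Python with NumPy is given below. The function name random_sparse_vector is hypothetical, and NumPy's uniform sampler draws from the half-open interval [0, 1), which differs from the open interval (0,1) only on an event of negligible probability.

```python
import numpy as np

def random_sparse_vector(N, k, rng):
    """Draw a k-sparse vector of length N with a uniformly random support,
    entries uniform on [0, 1), scaled to unit l2-norm."""
    x = np.zeros(N)
    support = rng.choice(N, size=k, replace=False)   # uniform over k-subsets
    x[support] = rng.uniform(0.0, 1.0, size=k)
    return x / np.linalg.norm(x)

rng = np.random.default_rng(0)
x = random_sparse_vector(2000, 20, rng)
```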

We recover signals using ℓ1-minimization. In this paper we will use the MATLAB LP-solver and
the implementations of Orthogonal Matching Pursuit (OMP) and Compressive Sampling Matching
Pursuit (CoSaMP) algorithms developed by Needell and Tropp [28]. Specifically, given a matrix
Φ and signal vector x, we compute y = Φx, and solve the ℓ1-minimization problem Φx̂ = y for
x̂. The objective function is the ℓ1-norm of x̂ and it is assumed that all variables are non-negative.
We consider the recovery successful if |x — x|i < c for some constant c. We take c = 10^^ in all 
the simulations presented in this paper. 
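
The linear program described above can be sketched in Python using SciPy's linprog in place of the MATLAB LP-solver; this substitution, the function name l1_recover, and the small problem size are assumptions made for illustration. Because all variables are non-negative, the ℓ1-norm of the unknown is simply the sum of its entries, so the objective is linear.

```python
import numpy as np
from scipy.optimize import linprog

def l1_recover(Phi, y):
    """Minimise sum(x_hat) subject to Phi @ x_hat = y and x_hat >= 0."""
    N = Phi.shape[1]
    res = linprog(c=np.ones(N), A_eq=Phi, b_eq=y,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x

# Small illustrative instance: Gaussian sensing matrix, non-negative 3-sparse signal.
rng = np.random.default_rng(1)
n, N, k = 40, 100, 3
Phi = rng.standard_normal((n, N))
x = np.zeros(N)
x[rng.choice(N, size=k, replace=False)] = rng.uniform(0.0, 1.0, size=k)
x /= np.linalg.norm(x)

y = Phi @ x
x_hat = l1_recover(Phi, y)
error = np.abs(x - x_hat).sum()   # l1 distance used in the success criterion
```

A recovery would then be declared successful when error falls below the chosen constant c.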

We conclude this section with an example illustrating the potential benefits of sparsification.
In Figure 1 we explore the effect of sparsification on a 200 × 2000 matrix Φ with entries
uniformly distributed on (0,1). The results for this case were compared with matrices drawn from
Sp(Φ, 0.1) and Sp(Φ, 0.05). For each signal sparsity between 1 and 60, we generated 500 random
vectors as described above and recorded the number of successful recoveries using the MATLAB
LP-solver. To avoid bias we generated a new random matrix for each trial.



Figure 1: Signal recovery comparison of Sp(Φ, 1), Sp(Φ, 0.1) and Sp(Φ, 0.05)

We observe that for signal vector sparsities between 45 and 55, matrices in Sp(Φ, 0.05) achieve
substantially better recovery than those from Sp(Φ, 1). The code used to generate this simulation
as well as others in this paper is available in full, along with data from multiple simulations at a 
webpage dedicated to this project: http://fintanhegarty.com/compresseci_sensing.html . 
4 Results


Our simulations produce large volumes of data. To highlight the interesting features of these 
data-sets, we propose the following measure for acceptable signal recovery in practice. 

Definition 2. For a matrix Φ and for 0 < t < 1, we define the t-recovery threshold, denoted R_t, to
be the largest value of k for which Φ recovers k-sparse signal vectors with probability exceeding t.

We construct an estimate R̂_t for R_t by running simulations. As the number of simulations
that we run increases, R̂_t converges to R_t. In practice this convergence is rapid. The definition
of R_t generalizes naturally to a space of matrices (say n × N Gaussian ensembles): it is simply
the expected value of R_t for a matrix chosen uniformly at random from the space. To estimate
R_t with reasonably high confidence, we proceed as follows: beginning with signals of sparsity
k = 1, we attempt 50 recoveries. We increment the value k by 1 and repeat until we reach the
first sparsity k_0 where fewer than 50t signals are recovered. Beginning at k_0 - 3, we attempt 200
recoveries at each signal sparsity. When we reach a signal sparsity k_1 where fewer than 200t signals
are recovered, we attempt 1000 signal recoveries at each signal sparsity starting at k_1 - 3. When
we reach a signal sparsity k_2 where fewer than 1000t signals are recovered, we set R̂_t = k_2 - 1.

We typically find that k_1 = k_2, which gives us confidence that R̂_t = R_t. Unless otherwise
specified, we use the assumptions outlined in Section 3.
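
The three-stage search above can be sketched as follows. This is a hedged illustration, not the authors' code: the callback attempt(k), which should run one random k-sparse recovery and report success, the safety cap k_max, and the toy success model are all assumptions introduced here.

```python
import random

def estimate_threshold(attempt, t, stages=(50, 200, 1000), k_max=10_000):
    """Estimate the t-recovery threshold with the three-stage search.

    attempt(k) must return True when one random k-sparse recovery succeeds.
    k_max is a hypothetical safety cap, not part of the procedure in the text.
    """
    start = 1
    first_fail = None
    for trials in stages:
        k = max(start, 1)
        while k <= k_max:
            successes = sum(bool(attempt(k)) for _ in range(trials))
            if successes < trials * t:
                break                  # fewer than trials * t recoveries at this k
            k += 1
        first_fail = k                 # k_0, k_1 and k_2 in turn
        start = first_fail - 3         # back off by three before the next stage
    return first_fail - 1

# Toy stand-in for a real recovery experiment (an assumption for illustration):
# the success probability falls linearly from 1 at k = 35 to 0 at k = 45.
toy_rng = random.Random(0)

def toy_attempt(k):
    p = min(1.0, max(0.0, (45 - k) / 10))
    return toy_rng.random() < p

print(estimate_threshold(toy_attempt, t=0.98))
```

In a real experiment attempt(k) would generate a fresh k-sparse signal, measure it, run the chosen recovery algorithm and apply the success criterion of Section 3.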

4.1 Recovery algorithms with sparsification 

As suggested already in Figure 1, taking Φ' ∈ Sp(Φ, s) for some value of s ≈ 0.05 seems to offer
considerable improvements when using linear programming for signal recovery. Similar results
also hold for OMP and CoSaMP, though note that in each case we supply these algorithms with
the sparsity of the signal vector. (While there is an option to withhold this data, the recovery
performance of CoSaMP seems to suffer substantially without it - and we wish to be able to
perform comparisons with linear programming.) In Figure 2 we graph R_0.98 of Sp(Φ, s) as a
function of s, where Φ is a 200 × 2000 matrix with entries drawn from the absolute values of
samples from a standard normal distribution.



Figure 2: Signal recovery as a function of matrix density for LP, OMP and CoSaMP 


For each algorithm, R_0.98 appears to attain a maximum for matrices of density between 0.04
and 0.12. It is perhaps interesting to note that the percentage improvement obtained by CoSaMP
is far greater than that for either of the other algorithms. Linear programming is an order of
magnitude slower than either of the other algorithms for these parameters in the dense case, but
has comparable running time on challenging instances of the sparse problem, as illustrated by
Table 1.



        Time taken for 100 recovery vectors         % of vectors successfully recovered
        CoSaMP               LP                     CoSaMP               LP
k       δ = 0.1   δ = 1      δ = 0.1   δ = 1        δ = 0.1   δ = 1      δ = 0.1   δ = 1
1       0.94      0.44       18.1      106.61       100       100        100       100
10      0.78      1.25       39.78     157.53       100       100        100       100
20      0.56      1.98       27.50     177.17       100       100        100       100
30      1.56      4.48       27.39     171.84       100       99         100       100
40      3.68      33.98      27.02     207.22       100       55         99        94
50      11.09     66.77      33.59     375.17       99        7          78        38
60      48.38     81.54      43.82     364.25       75        0          1         1
70      89.87     93.62      41.03     329.22       0         0          0         0

Table 1: Effect of sparsification on recovery time 


Table 1 shows the average time taken for one hundred vector recovery attempts using 200 ×
2000 measurement matrices with entries drawn from the absolute values of samples from a
normal distribution, over a range of vector sparsities. We observe an improvement in running 
time of an order of magnitude for linear programming when using sparsified matrices, and an 
improvement when using CoSaMP. 

4.2 Matrix constructions under sparsification 

In this section we explore the effect of sparsification on a number of different constructions
proposed for CS matrices. We have already encountered the Gaussian, Uniform and Bernoulli
ensembles. We will also consider some structured random matrices, which still have entries drawn
from a probability distribution, but the matrix entries are no longer independent. The partial
circulant ensemble [30] consists of rows sampled randomly from a circulant matrix, the first row
of which contains entries drawn uniformly at random from some suitable probability
distribution. Table 2 compares R_0.98 for Sp(Φ, 1) and Sp(Φ, 0.05) for 200 × 2000 matrices from each
of the classes listed. Note that in the case of the Bernoulli ensemble, we actually compare
Sp(J_{200,2000}, 0.5) with Sp(J_{200,2000}, 0.05), where J_{200,2000} is an all-ones matrix. The entries
of the partial circulant matrix were drawn from a normal distribution.

We denote by k the signal sparsity for which the greatest difference in recovery between Φ
and its sparsification is observed.

Finally, we investigate the effect of sparsification on matrices of varying parameters. In particular, 
we explore the effect of sparsification on a family of matrices with entries drawn from the absolute
value of the Gaussian distribution. First we explore the effect of sparsification as the ratio
of columns to rows in the sensing matrix increases. For this graph, we use signal vectors whose 
entries were drawn from the absolute value of the normal distribution. We observe a modest 
improvement in performance which appears to persist. 
                     R_0.98                Maximal performance difference
Construction         δ = 1     δ = 0.05    k      δ = 1     δ = 0.05
Normal               39        46          51     25        81
Uniform              39        45          51     24        73
Bernoulli            39        42          49     38        67
Partial Circulant    39        46          52     22        76

Table 2: Benefit of sparsification for different matrix constructions 



Figure 3: Recovery capability of matrices with 100 rows and varying number of columns under 
sparsification 


Now we fix the ratio of columns to rows of Φ to be 10, and vary the number of rows. We
know from the results of Candes et al. that R_0.98 = O(n/log(n)) in all cases. Nevertheless, the
clear difference in slopes for recovery at different sparsities offers compelling evidence that the
benefits of sparsification persist for large matrices.



Figure 4: Recovery with CoSaMP for matrices with fixed row to column ratio under sparsification 
5 Conclusion


Some of the most important open problems in compressed sensing relate to the development
of efficient matrix constructions and effective algorithms for sparse recovery. Deterministic
constructions are essentially limited by the Welch bound: using known methods it is not possible
to guarantee recovery of vectors of sparsity exceeding O(√n), where n is the number of rows
in the recovery matrix (see e.g. [6]). Probabilistic constructions are much better: the Candes-
Tao theory of restricted isometry parameters allows the provable recovery of vectors of sparsity
k in dimension N with O(k log(N)) measurements. Such guarantees hold with overwhelming
probability for Gaussian ensembles and many other classes of random matrices. But the random
nature of these matrices can make the design of efficient recovery algorithms difficult. In this
paper we have demonstrated that sparsification offers potential improvements for computational
compressed sensing. In particular, Figure 1 and Figure 4 show that sparsification results in
the recovery of vectors of higher sparsity. Table 1 shows a substantial improvement in
runtimes for linear programming arising from sparsification. These appear to be robust phenomena,
which persist under a variety of recovery algorithms and matrix constructions. At the problem
sizes that we explored, matrices with densities between 0.05 and 0.1 seemed to provide optimal
performance.

We conclude with a small number of observations and conjectures which we believe to be 
suitable for further investigation. Since a Bernoulli ensemble in our terminology can be regarded
as a sparsification of the all-ones matrix, it is clear that sparsification can improve CS
performance. The necessary decay in CS performance as the density approaches zero shows that the
effect of sparsification cannot be monotone. Extensive simulations suggest that when recovery is
achieved with a general purpose linear programming solver, matrices with approximately 10%
non-zero entries have substantially better CS properties than dense matrices. A catastrophic
decay of compressed sensing performance occurs in many of the examples we investigated between
densities of 0.05 and 0.01. We pose two questions which we think suitable for further research. 

Question 1: As the number of rows in Φ increases, the optimal matrix density appears to decrease. Is
there a function T(n, k, N) of the matrix parameters which describes the optimal level of
sparsification? We propose that T may be asymptotically independent
of the matrix construction and of the recovery algorithm. We conjecture that when N < n“ 
that the optimal density of a CS matrix will be approximately a.n^^ when k = 

Question 2: We have considered pseudo-random sparsifications in this paper. In general, this should 
not be necessary. Are there deterministic constructions for (0,1)-matrices with the property 
that their entry-wise product with a CS matrix improves CS performance? A natural class 
of candidates would be the incidence matrices of t-{v,k,\) designs (see |H for example). 
Some related work is contained in [SJISI. 